Video Fourier Transforms using PureData
ABSTRACT
Fourier Transforms have been available for PureData for a long time. However, they have never been applied to pixels, not counting the possibility that someone converted an image to sound in order to use [fft~] on it, which is tedious and is still just a one-dimensional transform. This paper exposes the possibilities of using Fourier Transforms on whole images and on sequences of them, which enables a new palette of effects that comes mostly from the many variations on the theme of applying the Convolution Theorem. These effects can be applied to realtime video, provided that the video stream uses sufficiently little bandwidth. This paper covers implementation issues at both generic and pd-specific levels and covers various effects that can be made using the [#fft] object class.

1. IMPLEMENTATION ISSUES

1.1 Number Field
To use Fourier methods, one needs a vector space, and to have a vector space, one needs a number field¹, such as the Real numbers, the Algebraic numbers, or the Rational numbers. For use with computers, true number fields are unwieldy, so approximations are in use, such as floating-point and fixed-point². Fixed-point doesn't go well with Fourier methods, because Fourier components usually have values at widely varying scales. However, video is usually all done in fixed-point, even in PureData and even though PureData is so strongly floating-point-oriented and so little fixed-point-oriented. Both GEM and PDP are devoid of any floating-point image formats, and even though GridFlow has supported floating-point images for over 4 years, little has been done using that feature.

¹ French corps, German Körper.
² A number system made of fractions with a fixed denominator; this is completely unrelated to iterative numerical methods, unlike the other meaning of the same word.
Naturally, there's an incentive towards finding a way to adapt FFT to fixed-point. One way is to use a high-precision fixed point, which allows for high-amplitude low-frequency components without clipping, and low-amplitude high-frequency components without much quantization. Another way to compensate is to scale FFT components according to their frequency, just like how the energy of a sound has to do with the speed of the speaker membrane rather than its position, so in a typical blend of bass and treble meant to be heard, treble will operate on a much smaller scale of speaker positions. This can be done either by multiplying each component by its frequency, or by applying FFT on the derivative of the signal instead of the signal itself, but a lot of care has to be put into balancing quantization and the risk of overflow. No matter how it's done, a special case has to be made for frequency 0 (DC).

Fortunately, it is now possible to consider using floating-point for all video computations, given how fast floating point is in modern processors, if one is willing to accept 96 bits-per-pixel (three channels of 32 bits each). GridFlow already defaults to that data rate (using integers instead), so from that perspective, making the jump towards floating-point is not a matter of RAM bandwidth. The FFTW3 library is used by recent versions of GridFlow to provide [#fft], PureData's only image-based FFT.
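The two routes to frequency scaling mentioned above (weighting each component, or transforming the derivative) can be compared numerically. The following is only a sketch under stated assumptions: it uses a naive pure-Python DFT rather than FFTW, a wrap-around first difference as a discrete stand-in for the derivative, and a made-up test signal. With a first difference, the per-bin weight is 1 - exp(-2πik/N), whose magnitude grows with frequency up to Nyquist instead of being exactly proportional to it.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Hypothetical test signal: a DC offset, strong "bass" at frequency 1
# and weak "treble" at frequency 6, over a block of 16 samples.
N = 16
signal = [10.0
          + 8.0 * math.cos(2 * math.pi * 1 * t / N)
          + 0.5 * math.cos(2 * math.pi * 6 * t / N)
          for t in range(N)]

# Wrap-around first difference, standing in for the derivative
# (signal[-1] is the last sample, which gives the wrap-around at t = 0).
difference = [signal[t] - signal[t - 1] for t in range(N)]

spectrum = dft(signal)
difference_spectrum = dft(difference)

# In the frequency domain, the same difference is a per-bin multiplication
# by a weight whose magnitude, 2*sin(pi*k/N), grows with k up to Nyquist.
weights = [1 - cmath.exp(-2j * math.pi * k / N) for k in range(N)]
scaled_spectrum = [w * s for w, s in zip(weights, spectrum)]

mismatch = max(abs(a - b) for a, b in zip(difference_spectrum, scaled_spectrum))
ratio_before = abs(spectrum[1]) / abs(spectrum[6])
ratio_after = abs(difference_spectrum[1]) / abs(difference_spectrum[6])

print("largest mismatch between the two routes:", mismatch)
print("bass/treble magnitude ratio before:", ratio_before)
print("bass/treble magnitude ratio after: ", ratio_after)
print("DC magnitude after differencing:", abs(difference_spectrum[0]))
```

The DC bin comes out as zero after differencing, which is one way to see why frequency 0 needs its own treatment: whatever was stored there cannot be recovered by dividing the weight back out.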
Note that all integer-based number fields that are not approximative wouldn't work with this kind of problem because they rely on modulo arithmetic, which introduces number relationships that have nothing to do with what we are trying to model.

1.2 Algebraic Extension
Fourier Theory is more understandable using Complex numbers than non-Complex numbers, because introducing the concept of square root of negative numbers reveals hidden relationships between concepts that seem very different in non-Complex numbers, making Complex mathematics less complex than non-Complex mathematics in some respects (but saying this doesn't make people any less scared of Complex numbers). However, there are some new complexities introduced by them.

A Complex number needs to be represented by a pair of non-Complex numbers, usually called "real" and "imaginary", which in the context of Fourier components may be called "cosine" and "sine" instead (respectively). This duality has to be implemented in some way in PureData. For the built-in [fft~], this is done using two signals. If [fft~] gets a single signal, the positive and negative frequencies will mirror each other in a redundant way. Because of this, there are [rfft~] and [rifft~], which skip over the redundant processing at the level of FFT itself, but a priori the intervening frequency-domain objects don't know about the situation and so can't get any faster. To really eliminate all work done because of negative frequencies, [rfft~] and [rifft~] would have to support two different sample rates at the same time, or else find a way to merge both signals together (but those merged signals would only be realistically processable by new special objects designed to work with them). So maybe we have to accept this as a fact of life and just get used to always processing signals in pairs, or wasting half of the operations on zeroes that get discarded.

For grids, however, the equivalent of sample rate can get as variable as desired, and so can the logical organisation of data.
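The mirroring of positive and negative frequencies for a real input can be observed directly. The sketch below is an illustration under assumptions of its own (a naive pure-Python DFT and an arbitrary 8-sample signal); it is not how [fft~] or [rfft~] are implemented.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Arbitrary real-valued block of 8 samples (made-up values).
N = 8
real_signal = [3.0, 1.0, -2.0, 5.0, 0.5, -1.5, 4.0, 2.0]
X = dft(real_signal)

# Bin N-k is the complex conjugate of bin k: same cosine part,
# opposite sine part. This is the redundancy a real FFT can skip.
mirrored = all(abs(X[(N - k) % N] - X[k].conjugate()) < 1e-9 for k in range(N))

# DC (bin 0) and Nyquist (bin N/2) are their own mirrors,
# so their sine (imaginary) parts vanish.
dc_sine = X[0].imag
nyquist_sine = X[N // 2].imag

# Bins 0 .. N/2 inclusive are enough to rebuild the whole spectrum.
independent_bins = N // 2 + 1

print("negative frequencies mirror positive ones:", mirrored)
print("sine part of DC:", dc_sine, " sine part of Nyquist:", nyquist_sine)
print("independent bins for N =", N, ":", independent_bins)
```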
A real FFT could be performed on a grid of size 256 by 256 by 3, outputting a pair of grids of size 256 by 128 by 3, where the duplicate frequencies have been removed (in 2-D, still only half of the values are wasted, which is why I'm not suggesting 128 by 128). In practice this is further complicated by the fact that DC and Nyquist don't really belong in the same bin (in 2-D, a column of bins), apart from being the two components of a complex FFT that don't have a sine part. If they were put in the same bin, one of them would have to pretend to be a sine. Due to this exclusion principle, 256 by 129 would be more appropriate, and if we wanted much alignment, 256 by 136 would be appropriate (and so would be 129 by 256).

For [#fft], I have decided to use a single grid instead of two. This turns out to be useful for avoiding procedures of interleaving and deinterleaving required by algorithms that want data in an interleaved fashion. Because of the order of dimensions, this grid's size can't be 2 by 256 by 256 by 3; it has to be 256 by 256 by 3 by 2 because the interleaving happens in the last dimension. Here, the 3 by 2 may be thought of as each channel of a bin having two subchannels (e.g. cosine green vs sine green); it may also be thought of as having 6 channels, and indeed it's sometimes useful to cast this to 256 by 256 by 6.

I have avoided writing [#rfft] and [#rifft] on the grounds that this optimisation would be premature, just like eliminating duplicate frequencies would also be premature. So far, the best use of resources with minimum complexity is to [#redim] so that a 4-channel system becomes a 2-by-2-channel system, wherever the frequency-domain operations don't need the channels to be absolutely separate (e.g. uses of the Convolution Theorem).

1.3 Vector Space
After Complex numbers have been accepted, a vector space needs to be made using them. There is a need to deal with the fact that we have several pixels and that they are going to be converted into frequency bins.
Complex numbers are taken to be the scalar field, which is raised (by cartesian product) to a power equal to the number of dimensions, which in the case of sound is the block size. Here a dimension is synonymous with a degree of freedom. In the case of a single image, the number of dimensions is the number of pixels, but that is not all that determines the structure of the vector space: the way that the FFT applies itself on the dimensions depends on relationships between the dimensions. In this case, the dimensions are organised in rows, columns and channels; that organisation could be called the "dimensions of dimensions". The two-dimensional FFT sought after is actually a pair of FFTs, one of which is along rows and the other along columns. Each of those can be seen as many smaller FFTs that operate on scalar components, or as one huge FFT that operates on large vector components (whole rows and whole columns of pixels).³

In a 2-D FFT, the concept of frequency is somewhat different, because frequency itself becomes vectorial⁴. A frequency is then a pair of a row frequency with a column frequency, the latter two being integers. For instance, for a picture of size (240 320) there is one particular bin that represents the harmonic that has frequency (40 64), meaning that its row frequency is 40 cycles per 240 pixels (one sixth of a cycle/pixel) and its column frequency is 64 cycles per 320 pixels (one fifth of a cycle/pixel). It is possible to talk about diagonal frequencies and the angles of frequencies, but the precision and relevance of this concept is limited when the number of possible frequencies is finite: the pixel grid, aka "square lattice", is obviously not rotation independent, except at right angles, else no-one would ever need anti-aliasing of diagonal lines.

FFT tends to be fastest when the number of bins factorizes well. The classic algorithm works in an amount of time per bin roughly proportional to the sum of the prime factors of the number of bins (repeated factors included).
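As a rough illustration of that cost estimate, the sum of prime factors can be tabulated for a few candidate sizes. This sketch assumes the estimate is read as work per bin (so the total is about the number of bins times that sum), and the function names are hypothetical; real libraries such as FFTW choose among many algorithms, so actual timings will differ.

```python
def prime_factors(n):
    """Return the prime factors of n with repetition, smallest first."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors

def cost_per_bin(n):
    """Sum of prime factors: a crude per-bin cost of a mixed-radix FFT."""
    return sum(prime_factors(n))

# 240 and 320 are the picture dimensions used as an example in the text,
# 256 is a power of two, and 251 is a prime close to 256.
for size in (240, 256, 320, 251):
    factors = prime_factors(size)
    print(size, "=", " * ".join(str(f) for f in factors),
          "-> per-bin cost", cost_per_bin(size),
          ", total about", size * cost_per_bin(size))
```

Under this crude model, sizes built from small primes stay close to powers of two, while a prime size is far more expensive.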
Such a cost model encourages the number of bins to be a power of two. PureData's FFT works only on powers of two, but when it's time to do FFT on images, without any spatial block size (unlike JPEG, which has an 8-by-8 block size), there's a serious need for padding before starting to do anything with power-of-two sizes, and if wanting to work in wrap-around coordinates, it's just impossible to pad. This means that non-power-of-two FFT has to be supported. Fortunately, FFTW does, for small prime factors (nowadays there exist variants of FFT for larger prime factors but they are not necessary here, as enough widths are already possible with small primes, even considering the additional restrictions that FFTW puts on this and that I'm not explaining).

There is never any FFT done along the channels dimension, because it's almost always useless to do that. For it to make sense, there have to be many channels and they have to form a finite ring in some intuitive way (but let's not get into that...).

There is rarely any FFT done along the time dimension. That dimension is usually not apparent in GridFlow because each image is a separate grid, but grid transmissions may be reframed using [#import] as a very large grid with a time dimension. FFT along the video time dimension faces the same problems as sound does, but in slow motion (and thus much more noticeably) because (among other reasons) the sample rate is over a thousand times lower. MPEG doesn't do any FFT along the time dimension. Currently, [#fft] doesn't support the time dimension.

2. APPLICATIONS

³ This is said considering that a structure like a matrix or a tensor or a grid is in the end a vector, when seen by plain vector operations. Other things, like matrix product and the Einstein notation, use the organisation of dimensions where plain vector operations don't.
⁴ Not in the exact sense of Linear Algebra, because indices don't usually form a field, and we don't want them to, and that's perfect because we don't need a true vector space as long as we don't need a well-defined exact division operator.

2.1 Fast Convolutions
There exist fast convolution algorithms for special convolution kernels. For example, a 15-by-15 square blur may be decomposed into the equivalent combination of a 15-by-1 rectangle blur and a 1-by-15 rectangle blur. Another example: a blur with kernel (1 2 3 4 5 4 3 2 1) can be decomposed into two identical blurs with kernel (1 1 1 1 1). Decomposition doesn't have to be by composition: if a kernel is (1 2 3 4 99 4 3 2 1) it can be decomposed as a sum of (94) and the previous kernel. However, it's not so easy to figure out an efficient decomposition. They're normally devised by human intervention and only for special cases. There are several ways to apply large rectangle blurs in the same time that small ones can be applied. Kernels that have lots of zeroes can be partially optimised out (which GridFlow does but the others don't). There are plenty of tricks and they still work only for particular kernel patterns.

FFT can be used to compute convolutions with any kernel in the same amount of time. The kernel has to be zero-padded so that it is as big as the picture itself. Typically, if you are convolving with a 15x15 kernel the ordinary way, it's possible that three FFTs and a Complex product together are faster than that. It's also possible with something as small as an 11x11 kernel. The threshold at which FFT becomes more efficient than dumb convolution is something that depends on the speed of both, but such a threshold always exists.

Fast convolution means the ability to make impressive large-scale blurs. Generally speaking, a blur is a low-pass filter.
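The recipe above (zero-pad the kernel to the picture size, take two forward transforms, multiply the spectra, take one inverse transform) can be sketched as follows. This is a minimal illustration under assumptions of its own: a naive pure-Python DFT applied along rows and then columns instead of FFTW, a tiny made-up 6-by-8 single-channel image, and a kernel anchored at the top-left corner, so the result is shifted compared to a centred kernel. It is not the [#fft] implementation.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse divides by N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def dft2(grid, inverse=False):
    """2-D DFT done as 1-D DFTs along rows, then along columns."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def zero_pad(kernel, h, w):
    """Place the kernel in the top-left corner of an h-by-w grid of zeroes."""
    padded = [[0.0] * w for _ in range(h)]
    for j, row in enumerate(kernel):
        for i, v in enumerate(row):
            padded[j][i] = v
    return padded

def circular_convolve(image, kernel):
    """Direct wrap-around convolution, used here only as a reference."""
    h, w = len(image), len(image[0])
    return [[sum(kernel[j][i] * image[(y - j) % h][(x - i) % w]
                 for j in range(len(kernel)) for i in range(len(kernel[0])))
             for x in range(w)] for y in range(h)]

# Made-up single-channel image and a small binomial blur kernel.
H, W = 6, 8
image = [[float((3 * y + 5 * x * x) % 7) for x in range(W)] for y in range(H)]
kernel = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]

# Two forward transforms, one Complex product, one inverse transform.
F_image = dft2(image)
F_kernel = dft2(zero_pad(kernel, H, W))
product = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(F_image, F_kernel)]
via_fft = [[v.real for v in row] for row in dft2(product, inverse=True)]

direct = circular_convolve(image, kernel)
max_error = max(abs(a - b) for ra, rb in zip(via_fft, direct) for a, b in zip(ra, rb))
print("largest difference between FFT route and direct route:", max_error)
```

The spectral route costs the same whatever the kernel contains, whereas the direct route grows with the kernel area; with a real FFT in place of the naive DFT, that is where the crossover comes from.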
Fast convolution can also be used for edge detection, which is a high-pass filter, but it's less useful at that, because edge detection tends to involve only very small areas, so it only needs very small kernels anyway, and then the results usually need conversion back to space-domain because of non-linear operations that have to be done on them afterwards. For example, in computing a polar Sobel, one has to compute the magnitude from the local horizontal and vertical ripples around every pixel position: this is a space-domain operation. If we multiply the horizontal ripples with themselves from a space-domain point of view, but the data is in spatial-frequency-domain, then a mirror image of the usual Convolution Theorem applies, and so this operation is equivalent to convolving that spectrum with itself, which is so abominably slow that it can only mean that the spectrum should be converted back to space-domain.

2.2 Notch Filter Design
Cameras are subject to electromagnetic interference due to insufficient (or completely absent) shielding in their sensor. Images scanned from magazines may show moiré effects due to interference between the pixel grid and the grid of the dithering. In both cases the image can be smoothed using a notch filter, supposing that the picture does not feature patterns that coincide with the interference in terms of frequencies (including the directions of the ripples).

In 2-D, notches may be more complex than just intervals. There are many moral equivalents of intervals:
• rectangle-shaped holes (cartesian products of intervals) and rhombus-shaped holes (a sheared rectangle hole)
• a disk-shaped hole (all points within a certain radius of a point) or an elliptic hole (a sheared disk made by slightly changing the definition of distance). This is what the Q-factor corresponds to in 2-D. In the case of the circle, the Q-factor is still a single number, but for the ellipse the Q-factor is a matrix.
• a circular band centered at the origin, in which case it is an isotropic notch filter, which is a 1-D filter radially converted to 2-D.
• some infinite band with straight parallel borders, in which case it's a 1-D filter linearly aligned in a specific direction.
• a more special 1-D filter is the one that filters out some angles. This is an infinite-size pacman. A finite-size pacman is a low-pass filter and an angle-notch filter at once.

To each of those filters, as usual, corresponds some kind of noise that this filter is good at getting rid of. In each case, the notch should be duplicated, rotated 180 degrees around the origin. This is the same as making a double reflection, one along the x and one along the y.

2.3 Deconvolution
Just like convolution is a multiplication of spectra, deconvolution is a division of spectra. Proper deconvolution is hard to do. First the kernel has to be invertible. This can be checked by looking at the spectrum of the kernel: it must not contain any zeroes, but even then, it must not contain values close to zero, else quantization effects will be extremely apparent. Quantization effects in general very quickly kill hopes of doing deconvolution the simple way.

2.4 Semiconvolution
It is possible to find which are the kernels that, when applied twice, have the same effect as a given kernel. This can be done by Complex square root, which can be done as division by 2 in the log domain or the cepstrum. Other fractional convolutions may be performed too, by using other constants.

2.5 Crosscorrelation
A kind of similarity comparison may be done by superimposing the images in all possible ways, each time measuring the brightness of the two multiplied images. This is done by convolving an image with a mirror of the other image. That mirror is actually a double mirroring (both x and y) so it's again the same as a 180-degree rotation. The impact on the spectrum is to change the sign of the sine components (conjugation).
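The conjugation rule can be sketched as follows: multiply one spectrum by the conjugate of the other, transform back, and look for the brightest position. This is a minimal illustration under assumptions of its own: a naive pure-Python DFT instead of FFTW, a made-up 8-by-8 random image, and a second image that is an exact wrap-around translation of the first. Real footage is neither noise-free nor periodic, so this is not a complete motion estimator.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse divides by N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def dft2(grid, inverse=False):
    """2-D DFT done as 1-D DFTs along rows, then along columns."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# Made-up reference image with no periodic structure.
H, W = 8, 8
rng = random.Random(0)
reference = [[rng.random() for _ in range(W)] for _ in range(H)]

# The same image translated by 2 rows and 3 columns, with wrap-around.
true_shift = (2, 3)
moved = [[reference[(y - true_shift[0]) % H][(x - true_shift[1]) % W]
          for x in range(W)] for y in range(H)]

# Crosscorrelation: multiply one spectrum by the conjugate of the other
# (sine parts negated), then return to the space domain.
F_moved = dft2(moved)
F_reference = dft2(reference)
cross_spectrum = [[a * b.conjugate() for a, b in zip(ra, rb)]
                  for ra, rb in zip(F_moved, F_reference)]
correlation = [[v.real for v in row] for row in dft2(cross_spectrum, inverse=True)]

# The brightest position of the result gives the translation.
peak = max(((y, x) for y in range(H) for x in range(W)),
           key=lambda p: correlation[p[0]][p[1]])
print("estimated shift (rows, columns):", peak)
```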
Crosscorrelation of an image with itself is called autocorrelation; it detects self-similarity in an image. Because this is a similarity of translation only (and not rotation nor scaling) it will detect only the periods and directions of periodic patterns, and not of other patterns. I use crosscorrelation to detect and measure motion of the camera (or of the whole scenery) in terms of distances and direction (which is much different from other things that are called motion detection in GridFlow).

2.6 Fourier Interpolation
A spectrum may be extended using zeroes so that it becomes the spectrum of a bigger image. Then an inverse FFT will yield that bigger image, which will be exactly bandlimited, unlike all other interpolation methods. This isn't necessarily the most desirable, as diffraction patterns appear around any sharp edges in the image (this is called the Gibbs phenomenon in math, but it's equivalent to the diffraction patterns that are observed in physics).

2.7 Spectrum of common convolution kernels
Using a single Fourier transform, one can find the spectrum of a kernel in the context of a certain size of image (each size will yield a different spectrum). This can help understanding common FIR filters in FFT terms. Here are some examples.

A rectangle blur is a low-pass filter that has a diffraction pattern in its spectrum (for rectangle sizes above 2). Binomial blurs have coefficients (1 1) or are made by composing those coefficients with themselves, effectively yielding rows from Pascal's triangle; those blurs have no diffraction patterns in their spectra, hence they are smoother in some way. The (-1 2 -1) kernel is an edge detector, that is, a high-pass filter.

Kernels that are symmetric (palindromic) do not introduce any phase delay (that is, the phase delay is either 0 or pi). Phase delay has to be understood in image terms, which is that the contents of an image appear to have moved in a specific direction and for a specific distance when the filter is applied.
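These kernel spectra can be inspected with a one-dimensional sketch. The assumptions here are mine: a naive pure-Python DFT, a 16-sample row, odd-length kernels centred on sample 0 with wrap-around (so that a palindromic kernel yields a purely real spectrum), and (1 4 6 4 1) as the binomial example. The "diffraction pattern" is read here as the spectrum swinging below zero.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def centered_spectrum(kernel, n):
    """Zero-pad an odd-length kernel to n samples, centred on index 0
    with wrap-around, and return its spectrum."""
    padded = [0.0] * n
    half = len(kernel) // 2
    for i, v in enumerate(kernel):
        padded[(i - half) % n] += v
    return dft(padded)

N = 16
rectangle = centered_spectrum([1, 1, 1, 1, 1], N)   # rectangle blur
binomial = centered_spectrum([1, 4, 6, 4, 1], N)    # row of Pascal's triangle
edge = centered_spectrum([-1, 2, -1], N)            # edge detector

def is_real(spectrum):
    """True when every bin has a (numerically) zero sine part,
    i.e. the phase is either 0 or pi."""
    return all(abs(v.imag) < 1e-9 for v in spectrum)

print("all three spectra are purely real:",
      is_real(rectangle) and is_real(binomial) and is_real(edge))
print("rectangle blur, smallest bin value:", min(v.real for v in rectangle))
print("binomial blur, smallest bin value: ", min(v.real for v in binomial))
print("edge detector at DC:", abs(edge[0]), " at Nyquist:", edge[N // 2].real)
```

The rectangle spectrum dips below zero (sign flips, that is, a phase of pi on some bins), the binomial spectrum stays non-negative, and the edge detector is zero at DC and largest at Nyquist.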